Nowadays, applications adapt to the needs of their users (ergonomics, performance, and feasibility) in order to retain customers.

# Presentation of the proposed solution:

In this lab assignment (TP), we work with an event log from a bank payment application. The goal is to identify the process most frequently followed while the application is in use.
Process mining makes process analysis relevant again. It uses the data generated by information systems to automatically produce models of the actual processes, annotated with frequencies and performance measures. These process models also make it easier to spot compliance problems at a glance.

# Results:
library(bupaR)
## Warning: package 'bupaR' was built under R version 3.5.2
## Loading required package: edeaR
## Warning: package 'edeaR' was built under R version 3.5.2
## Loading required package: eventdataR
## Warning: package 'eventdataR' was built under R version 3.5.2
## Loading required package: processmapR
## Warning: package 'processmapR' was built under R version 3.5.2
## Loading required package: xesreadR
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'xesreadR'
## Loading required package: processmonitR
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'processmonitR'
## Loading required package: petrinetR
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'petrinetR'
##
## Attaching package: 'bupaR'
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:utils':
##
## timestamp
library(edeaR)
library(processmapR)
library(eventdataR)
#install.packages('tidyverse')
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.5.2
## -- Attaching packages ------------------------------------------------------ tidyverse 1.2.1 --
## v ggplot2 3.1.0 v purrr 0.2.5
## v tibble 1.4.2 v dplyr 0.7.7
## v tidyr 0.8.1 v stringr 1.3.1
## v readr 1.3.1 v forcats 0.3.0
## Warning: package 'readr' was built under R version 3.5.2
## -- Conflicts --------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks bupaR::filter(), stats::filter()
## x dplyr::lag() masks stats::lag()
library(readr)
library(tidyverse)
library(DiagrammeR)
## Warning: package 'DiagrammeR' was built under R version 3.5.2
library(ggplot2)
library(stringr)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
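# Load the event log from a CSV file chosen interactively
credit <- read.csv(file = file.choose())
# First 1000 rows, kept as a small sample for quick checks
echant <- credit[1:1000, ]
# Parse timestamps such as "2016/01/08 19:56:43.212";
# %OS reads the seconds together with their fractional part
credit$starttimestamp <- as.POSIXct(credit$Start_Timestamp, tz = "",
                                    format = "%Y/%m/%d %H:%M:%OS")
credit$endtimestamp <- as.POSIXct(credit$Complete_Timestamp, tz = "",
                                  format = "%Y/%m/%d %H:%M:%OS")
# Replace blanks with underscores and drop commas in variable names
names(credit) <- str_replace_all(names(credit), c(" " = "_", "," = ""))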
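Timestamps in this log look like `2016/01/08 19:56:43.212`. As a minimal base-R sketch (the toy vector `raw` and the choice of `tz = "UTC"` are assumptions made for reproducibility, not part of the original analysis), such strings could be parsed with fractional seconds as follows. Note that empty strings become `NA`, which is one plausible source of the `NA` values reported for the log's start and end.

```r
# Toy timestamps in the same layout as the Start_Timestamp column (hypothetical sample).
raw <- c("2016/01/08 19:56:43.212", "2016/01/29 09:10:58.778", "")

# %OS reads the seconds together with their fractional part.
# tz = "UTC" is an assumption so that the result does not depend on the local time zone.
parsed <- as.POSIXct(raw, format = "%Y/%m/%d %H:%M:%OS", tz = "UTC")

# Empty strings cannot be parsed and become NA.
is.na(parsed)

# Seconds within the minute, fractional part included.
as.numeric(parsed[1:2]) %% 60
```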
events <- bupaR::activities_to_eventlog(
  credit,
  case_id = 'Case_ID',
  activity_id = 'Activity',
  resource_id = 'Resource',
  timestamps = c('starttimestamp', 'endtimestamp'))
summary(events)
## Number of events: 800000
## Number of cases: 57165
## Number of traces: 5256
## Number of distinct activities: 26
## Average trace length: 13.99458
##
## Start eventlog: NA
## End eventlog: NA
## Case_ID Activity Resource
## Length:800000 : 73546 User_1 : 98752
## Class :character O_Create Offer : 56048 : 73546
## Mode :character O_Created : 56048 User_49: 15556
## O_Sent (mail and online): 51876 User_29: 13864
## W_Validate application : 50960 User_3 : 13610
## A_Validating : 50026 User_10: 13388
## (Other) :461496 (Other):571284
## Start_Timestamp Complete_Timestamp
## : 73546 : 73546
## 2016/01/08 19:56:43.212: 4 2016/01/08 19:56:43.212: 4
## 2016/01/29 09:10:58.778: 4 2016/01/29 09:10:58.778: 4
## 2016/03/02 15:15:40.745: 4 2016/03/02 15:15:40.745: 4
## 2016/07/11 15:17:14.450: 4 2016/06/16 13:04:35.404: 4
## 2016/07/15 13:09:09.433: 4 2016/07/11 15:17:14.450: 4
## (Other) :726434 (Other) :726434
## Variant Variant_index X.case._ApplicationType
## : 73546 Min. : 1.0 : 73546
## Variant 1: 55176 1st Qu.: 7.0 Limit raise: 72788
## Variant 2: 39140 Median : 25.0 New credit :653666
## Variant 3: 29370 Mean : 390.3
## Variant 5: 21756 3rd Qu.: 257.0
## Variant 8: 20016 Max. :3159.0
## (Other) :560996 NA's :73546
## X.case._creditGoal X.case._RequestedAmount Accepted
## Car :233912 Min. : 0 : 73546
## Home improvement :190592 1st Qu.: 6000 false: 16718
## Existing credit takeover:142022 Median : 12000 true : 39330
## : 73546 Mean : 15618 NA's :670406
## Unknown : 64138 3rd Qu.: 20000
## Not speficied : 28408 Max. :450000
## (Other) : 67382 NA's :73546
## Action CreditScore EventID
## : 73546 Min. : 0.0 : 73546
## Created : 96832 1st Qu.: 0.0 Application_1000158214: 2
## Deleted : 55582 Median : 0.0 Application_1000311556: 2
## Obtained :109138 Mean : 319.9 Application_1000339879: 2
## statechange:464902 3rd Qu.: 851.0 Application_100034150 : 2
## Max. :1142.0 Application_1000557783: 2
## NA's :743952 (Other) :726444
## EventOrigin FirstWithdrawalAmount MonthlyCost
## : 73546 Min. : 0 Min. : 43.0
## Application:308920 1st Qu.: 0 1st Qu.: 150.0
## Offer :252814 Median : 5000 Median : 232.8
## Workflow :164720 Mean : 7681 Mean : 273.9
## 3rd Qu.:10996 3rd Qu.: 340.8
## Max. :75000 Max. :6673.8
## NA's :743952 NA's :743952
## NumberOfTerms OfferID OfferedAmount
## Min. : 5 : 73546 Min. : 5000
## 1st Qu.: 56 Offer_1000226917: 8 1st Qu.: 8000
## Median : 74 Offer_1000329580: 8 Median :15000
## Mean : 82 Offer_1000373613: 8 Mean :17820
## 3rd Qu.:120 Offer_1000572979: 8 3rd Qu.:24000
## Max. :180 (Other) :196734 Max. :75000
## NA's :743952 NA's :529688 NA's :743952
## Selected lifecycle.transition activity_instance_id
## : 73546 : 73546 Length:800000
## false: 27830 complete:617316 Class :character
## true : 28218 start :109138 Mode :character
## NA's :670406
##
##
##
## lifecycle_id timestamp .order
## endtimestamp :400000 Min. :2016-01-01 10:51:15 Min. :1e+00
## starttimestamp:400000 1st Qu.:2016-03-19 16:12:32 1st Qu.:2e+05
## Median :2016-06-07 10:30:24 Median :4e+05
## Mean :2016-05-28 15:59:38 Mean :4e+05
## 3rd Qu.:2016-08-02 20:04:28 3rd Qu.:6e+05
## Max. :2017-01-26 10:11:10 Max. :8e+05
## NA's :74194
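The summary reports events, cases, traces, and an average trace length. The following base-R sketch, on a hypothetical toy log (the data frame `toy` and its column names are illustrative assumptions), shows how these four figures relate. In the summary above, the average trace length is the number of events divided by the number of cases (800000 / 57165 ≈ 13.99).

```r
# Hypothetical toy log: one row per event, already ordered in time within each case.
toy <- data.frame(
  case     = c("c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3"),
  activity = c("A",  "B",  "C",  "A",  "B",  "C",  "A",  "C"),
  stringsAsFactors = FALSE
)

# A trace is the ordered sequence of activities of one case.
traces <- tapply(toy$activity, toy$case, paste, collapse = " > ")

n_events <- nrow(toy)               # number of rows in the log
n_cases  <- length(traces)          # number of distinct case identifiers
n_traces <- length(unique(traces))  # number of distinct activity sequences
avg_len  <- n_events / n_cases      # average number of events per case

c(events = n_events, cases = n_cases, traces = n_traces)
```

Here cases `c1` and `c2` share the same trace, so three cases yield only two distinct traces.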
-> The log contains 800,000 events, 57,165 cases, and 5,256 traces.

### Activity frequency
events %>%
  activity_frequency(level = "activity")
## # A tibble: 26 x 3
## Activity absolute relative
## <fct> <int> <dbl>
## 1 "" 36773 0.0919
## 2 A_Accepted 20392 0.0510
## 3 A_Cancelled 6798 0.0170
## 4 A_Complete 20311 0.0508
## 5 A_Concept 20392 0.0510
## 6 A_Create Application 20392 0.0510
## 7 A_Denied 2319 0.00580
## 8 A_Incomplete 14335 0.0358
## 9 A_Pending 11275 0.0282
## 10 A_Submitted 13233 0.0331
## # ... with 16 more rows
events %>%
  activity_frequency(level = "activity") %>%
  plot()
-> The most frequent activities are O_Created, O_Create Offer, and O_Sent (mail and online).
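To make the `absolute` and `relative` columns concrete, here is a minimal base-R sketch on a hypothetical toy log (the data frame `toy` is an assumption for illustration). It counts occurrences per activity and divides by the total, which is the idea behind an activity-level frequency table. It is not the `activity_frequency()` implementation itself.

```r
# Hypothetical toy log with one row per activity instance.
toy <- data.frame(
  case     = c("c1", "c1", "c1", "c2", "c2", "c3"),
  activity = c("O_Create Offer", "O_Created", "O_Sent",
               "O_Create Offer", "O_Created", "O_Create Offer"),
  stringsAsFactors = FALSE
)

absolute <- table(toy$activity)   # occurrences per activity
relative <- prop.table(absolute)  # share of all occurrences

freq <- data.frame(
  activity = names(absolute),
  absolute = as.integer(absolute),
  relative = as.numeric(relative),
  stringsAsFactors = FALSE
)

# Most frequent activity first.
freq <- freq[order(-freq$absolute), ]
freq
```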
events %>%
  filter_activity_presence(activities = c('A_Cancelled')) %>%
  activity_frequency(level = "activity")
## # A tibble: 21 x 3
## Activity absolute relative
## <fct> <int> <dbl>
## 1 A_Accepted 6798 0.0724
## 2 A_Cancelled 6798 0.0724
## 3 A_Complete 6738 0.0718
## 4 A_Concept 6798 0.0724
## 5 A_Create Application 6798 0.0724
## 6 A_Incomplete 933 0.00994
## 7 A_Submitted 4939 0.0526
## 8 A_Validating 997 0.0106
## 9 O_Cancelled 9121 0.0972
## 10 O_Create Offer 9121 0.0972
## # ... with 11 more rows
plt <- events %>%
  filter_activity_presence(activities = c('A_Cancelled')) %>%
  activity_frequency(level = "activity")
plot(plt)
-> In the cases that contain A_Cancelled, the most frequent activities are O_Created, O_Create Offer, and O_Cancelled.

### Process map graph
library(DiagrammeRsvg)
## Warning: package 'DiagrammeRsvg' was built under R version 3.5.2
library(rsvg)
## Warning: package 'rsvg' was built under R version 3.5.2
events %>%
  filter_activity_frequency(percentage = 1.0) %>% # percentage = 1.0 keeps all activities
  filter_trace_frequency(percentage = .80) %>% # show only the most frequent traces
  process_map(render = TRUE)
## Warning: Prefixing `UQ()` with the rlang namespace is deprecated as of rlang 0.3.0.
## Please use the non-prefixed form or `!!` instead.
##
## # Bad:
## rlang::expr(mean(rlang::UQ(var) * 100))
##
## # Ok:
## rlang::expr(mean(UQ(var) * 100))
##
## # Good:
## rlang::expr(mean(!!var * 100))
##
## This warning is displayed once per session.
The map shows how the activities are linked to one another, which leads us to the process most frequently followed by customers.
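The links in a process map come from counting which activity directly follows which within each case. A minimal base-R sketch of that counting, on a hypothetical toy log (the data frame `toy` and the names `pairs` and `dfg` are illustrative assumptions), could look like this:

```r
# Hypothetical toy log, ordered in time within each case.
toy <- data.frame(
  case     = c("c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3"),
  activity = c("A",  "B",  "C",  "A",  "B",  "C",  "A",  "C"),
  stringsAsFactors = FALSE
)

# For each case, pair every activity with the one that directly follows it.
pairs <- do.call(rbind, lapply(split(toy$activity, toy$case), function(acts) {
  data.frame(from = head(acts, -1), to = tail(acts, -1),
             stringsAsFactors = FALSE)
}))

# Count how often each "from -> to" transition occurs.
dfg <- table(from = pairs$from, to = pairs$to)
dfg
```

The most frequent path through the resulting table (here A, then B, then C) corresponds to the thickest arrows one would expect to see on the map.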
library(DiagrammeRsvg)
library(rsvg)
events %>%
  filter_activity_frequency(percentage = 1.0) %>% # percentage = 1.0 keeps all activities
  filter_trace_frequency(percentage = .80) %>% # show only the most frequent traces
  process_map(performance(mean, "mins"),
              render = TRUE)
# precedence matrix ####
# Store the matrix under a name that does not shadow precedence_matrix(), then plot it once
prec_matrix <- events %>%
  filter_activity_frequency(percentage = 1.0) %>% # percentage = 1.0 keeps all activities
  filter_trace_frequency(percentage = .80) %>% # show only the most frequent traces
  precedence_matrix()
plot(prec_matrix)
The precedence matrix is another way to identify which activities and operations are performed one after another.

## Activity traces
# trace explorer
# Use a variable name that does not shadow the trace_explorer() function
trace_plot <- events %>%
  trace_explorer(coverage = 0.7)
plot(trace_plot)
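The `coverage = 0.7` argument restricts the view to the most frequent trace variants. As a rough base-R sketch of that idea (the variant counts are hypothetical, and the exact cut-off rule used by `trace_explorer()` may differ from the "up to and including" rule assumed here), variants can be sorted by frequency and kept until their cumulative share of cases reaches the coverage:

```r
# Hypothetical trace variants with the number of cases following each one.
variant_counts <- c("A > B > C" = 50, "A > C" = 30, "A > B > B > C" = 15, "A" = 5)

sorted    <- sort(variant_counts, decreasing = TRUE)
cum_share <- cumsum(sorted) / sum(sorted)  # cumulative share of cases

coverage <- 0.7
# Assumption: keep variants up to and including the first one that reaches the coverage.
n_keep <- which(cum_share >= coverage)[1]
kept   <- names(sorted)[seq_len(n_keep)]
kept
```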
events %>%
  filter_activity_frequency(percentage = 1.0) %>% # percentage = 1.0 keeps all activities
  filter_trace_frequency(percentage = .80) %>% # show only the most frequent traces
  group_by(X.case._ApplicationType) %>%
  throughput_time('log', units = 'hours')
## # A tibble: 3 x 10
## X.case._Applica~ min q1 median mean q3 max st_dev iqr
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 New credit 0.110 281. 526. 535. 762. 1837. 4 NA
## 2 "" NA NA NA NaN NA NA 36773 NA
## 3 Limit raise 0.0597 219. 326. 403. 593. 1197. 235. 374.
## # ... with 1 more variable: NA. <dbl>
The most common activity when using the application is applying for a bank loan (application type "New credit").
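Throughput time is the time elapsed between the first and the last event of a case. A minimal base-R sketch on hypothetical data (the data frame `toy`, its column names, and its timestamps are assumptions for illustration) computes it per case in hours and then summarises it per application type, mirroring the grouped call above:

```r
# Hypothetical toy log with start and end timestamps per activity instance.
toy <- data.frame(
  case  = c("c1", "c1", "c2", "c2", "c3", "c3"),
  type  = c("New credit", "New credit", "New credit", "New credit",
            "Limit raise", "Limit raise"),
  start = as.POSIXct(c("2016-01-01 10:00:00", "2016-01-02 10:00:00",
                       "2016-01-05 08:00:00", "2016-01-05 20:00:00",
                       "2016-02-01 09:00:00", "2016-02-01 12:00:00"), tz = "UTC"),
  end   = as.POSIXct(c("2016-01-01 11:00:00", "2016-01-03 10:00:00",
                       "2016-01-05 09:00:00", "2016-01-06 08:00:00",
                       "2016-02-01 10:00:00", "2016-02-01 15:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Throughput time of a case: last completion minus first start, in hours.
per_case <- do.call(rbind, lapply(split(toy, toy$case), function(d) {
  data.frame(
    case  = d$case[1],
    type  = d$type[1],
    hours = as.numeric(difftime(max(d$end), min(d$start), units = "hours")),
    stringsAsFactors = FALSE
  )
}))

# Median throughput time per application type.
aggregate(hours ~ type, data = per_case, FUN = median)
```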
plott <- events %>%
  filter_activity_frequency(percentage = 1.0) %>% # percentage = 1.0 keeps all activities
  filter_trace_frequency(percentage = .80) %>% # show only the most frequent traces
  group_by(X.case._ApplicationType) %>%
  throughput_time('log', units = 'hours')
plot(plott)
## Warning: Removed 36777 rows containing non-finite values (stat_boxplot).
events %>%
  filter_trace_frequency(percentage = .80) %>% # show only the most frequent traces
  group_by(X.case._creditGoal) %>%
  throughput_time('log', units = 'hours')
## # A tibble: 14 x 10
## X.case._creditG~ min q1 median mean q3 max st_dev iqr
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Existing credit~ 1.19 305. 529. 548. 763. 1697. 2 NA
## 2 Home improvement 0.178 290. 500. 531. 762. 1837. 1 NA
## 3 Car 0.144 242. 450. 506. 762. 1828. 1 NA
## 4 "" NA NA NA NaN NA NA 36773 NA
## 5 Not speficied 0.595 334. 733. 581. 779. 1442. 279. 446.
## 6 Caravan / Camper 56.2 213. 355. 450. 745. 1334. 273. 532.
## 7 Unknown 0.0597 232. 338. 432. 734. 1314. 248. 501.
## 8 Motorcycle 57.8 230. 438. 499. 764. 1338. 285. 534.
## 9 Business goal 233. 310. 660. 553. 756. 851. 245. 446.
## 10 Boat 55.0 267. 428. 517. 755. 1378. 282. 488.
## 11 Extra spending ~ 43.2 265. 443. 503. 756. 1188. 262. 491.
## 12 Remaining debt ~ 0.62 345. 735. 617. 782. 1669. 317. 437.
## 13 Tax payments 51.4 288. 432. 490. 742. 1165. 266. 455.
## 14 Debt restructur~ 732. 732. 732. 732. 732. 732. NA 0
## # ... with 1 more variable: NA. <dbl>
The credits were mainly used for home improvement, buying a car, or taking over an existing credit.
plot2 <- events %>%
  filter_trace_frequency(percentage = .80) %>% # show only the most frequent traces
  group_by(X.case._creditGoal) %>%
  throughput_time('log', units = 'hours')
plot(plot2)
## Warning: Removed 36777 rows containing non-finite values (stat_boxplot).
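The warnings about removed non-finite values, together with the `""` group in the tables and the 73,546 empty values per column in the summary, suggest that the raw file contains blank rows. One possible cleanup, which was not performed in the analysis above and is shown here only as a sketch on hypothetical data (the data frame `raw` is an assumption), would be to drop rows without a case identifier before building the event log:

```r
# Hypothetical raw export with one blank row, mimicking the empty records
# that appear as "" in the summary.
raw <- data.frame(
  Case_ID  = c("Application_1", "Application_1", "", "Application_2"),
  Activity = c("A_Create Application", "A_Submitted", "", "A_Create Application"),
  stringsAsFactors = FALSE
)

# Treat empty or NA case identifiers as missing and drop those rows.
is_blank <- is.na(raw$Case_ID) | trimws(raw$Case_ID) == ""
clean    <- raw[!is_blank, ]

nrow(raw) - nrow(clean)  # number of rows dropped
```

Applying such a filter to the real data would change the counts reported earlier, so the figures in this report would need to be regenerated.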
## Interpretation: